transit_service_analyst demo¶

Import libraries¶

In [14]:
%matplotlib inline
import pandas as pd
import geopandas as gpd
import transit_service_analyst as tsa
import plotly.express as px
import shapely.geometry
import numpy as np
import plotly.offline as pyo
import plotly.io as pio
pio.renderers.default='notebook'
#import nbformat
#import plotly.io as pio

Create an instance of transit_service_analyst using a gtfs feed on file¶

In [15]:
path = r'R:/e2projects_two/Angela/transit_routes_2018/latest_combined'
transit_analyst = tsa.load_gtfs(path, '20180417')
C:\Users\scoe\Anaconda3\envs\geopandas\lib\site-packages\transit_service_analyst\gtfs_service.py:116: DtypeWarning:

Columns (6) have mixed types. Specify dtype option on import or set low_memory=False.

C:\Users\scoe\Anaconda3\envs\geopandas\lib\site-packages\geopandas\array.py:275: ShapelyDeprecationWarning:

The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.

C:\Users\scoe\Anaconda3\envs\geopandas\lib\site-packages\geopandas\array.py:275: ShapelyDeprecationWarning:

The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.

The function get_lines_gdf() returns a GeodatFrame for each unique line by route, as determined by stop pattern. A trip_id representing each stop pattern is used as the unique ID and is called rep_trip_id. Show the first five rows:¶

In [16]:
transit_analyst.get_lines_gdf().head()
Out[16]:
shape_id geometry route_id rep_trip_id service_id trip_headsign direction_id block_id wheelchair_accessible bikes_allowed ... peak_flag fare_id agency_id route_short_name route_long_name route_type route_color route_text_color route_desc route_url
0 CT_510:74 LINESTRING (-122.19550 47.97913, -122.19593 47... CT_510 CT_7554558 1 Seattle 5th Ave 1.0 b_KPOB-CS:221:0:Weekday:1:18MAR:85001:12345 1.0 1.0 ... NaN NaN 29 510 Everett - Seattle 3 0070c0 ffffff NaN NaN
1 CT_510:75 LINESTRING (-122.19773 47.97482, -122.19777 47... CT_510 CT_7554559 1 Seattle 5th Ave 1.0 b_KPOB-CS:221:0:Weekday:1:18MAR:85001:12345 1.0 1.0 ... NaN NaN 29 510 Everett - Seattle 3 0070c0 ffffff NaN NaN
2 CT_512:63 LINESTRING (-122.32897 47.59883, -122.32897 47... CT_512 CT_7554560 1 Everett Station via Lynnwood 1.0 b_KPOB-CS:221:0:Weekday:1:18MAR:85001:12345 1.0 1.0 ... NaN NaN 29 512 Everett - Seattle 3 0070c0 ffffff NaN NaN
3 CT_421:36:00 LINESTRING (-122.33432 47.61585, -122.33437 47... CT_421 CT_7554561 1 Marysville 0.0 b_KPOB-CS:221:0:Weekday:1:18MAR:84050:12345 1.0 1.0 ... NaN NaN 29 421 Marysville - Seattle 3 0070c0 ffffff NaN NaN
4 CT_511:73 LINESTRING (-122.32897 47.59883, -122.32897 47... CT_511 CT_7554563 1 Ash Way P&R Lynnwood TC 0.0 b_KPOB-CS:221:0:Weekday:1:18MAR:85015:12345 1.0 1.0 ... NaN NaN 29 511 Ash Way - Seattle 3 0070c0 ffffff NaN NaN

5 rows × 21 columns

Function to map line geometry¶

In [17]:
# Functiont to plot lines
def create_line_map(df, name_col):
    if len(df) == 0:
        print ('Empty DF. No lines to map!')
    else:
        lats = []
        lons = []
        names = []

        for feature, name in zip(df.geometry, df[name_col]):
            if isinstance(feature, shapely.geometry.linestring.LineString):
                linestrings = [feature]
            elif isinstance(feature, shapely.geometry.multilinestring.MultiLineString):
                linestrings = feature.geoms
            else:
                continue
            for linestring in linestrings:
                x, y = linestring.xy
                lats = np.append(lats, y)
                lons = np.append(lons, x)
                names = np.append(names, [name]*len(y))
                lats = np.append(lats, None)
                lons = np.append(lons, None)
                names = np.append(names, None)


        fig = px.line_mapbox(lat=lats, lon=lons, hover_name=names,
                     mapbox_style="open-street-map", zoom=8)
        fig.show()

Let's map one line for each route:¶

In [18]:
# Call function to map routes
geo_df = transit_analyst.get_lines_gdf()
create_line_map(geo_df.groupby('route_id').first().reset_index(), 'route_short_name')

The field transit_type is included so, for example, we can plot all routes that are not bus.¶

In [19]:
not_bus = geo_df[geo_df['route_type']!=3]
create_line_map(not_bus, 'route_short_name')

The function get_tph_by_line() returns a DataFrame with hourly frequencies for each rep_trip. Show the first 5 rows:¶

In [20]:
transit_analyst.get_tph_by_line().head()
Out[20]:
rep_trip_id hour_2 hour_3 hour_4 hour_5 hour_6 hour_7 hour_8 hour_9 hour_10 ... hour_21 hour_22 hour_23 hour_24 hour_25 hour_26 hour_27 hour_28 route_id direction_id
0 CT_7554558 0.0 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 CT_510 1.0
1 CT_7554559 0.0 0.0 0.0 4.0 6.0 4.0 4.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 CT_510 1.0
2 CT_7554560 0.0 0.0 0.0 3.0 4.0 4.0 4.0 4.0 4.0 ... 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 CT_512 1.0
3 CT_7554561 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 CT_421 0.0
4 CT_7554563 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 CT_511 0.0

5 rows × 30 columns

Use get_tph_by_line() to get all rep_trips that have frequent service of 4 or more trips per hour during peak hours.¶

In [21]:
list_of_hours = ['hour_6', 'hour_7', 'hour_8', 'hour_15', 'hour_16', 'hour_17']
min_tph = 4
freq = transit_analyst.get_tph_by_line()
# create binary column for each hour
cols = []
for hour in list_of_hours:
    #print (len(freq))
    freq = freq[freq[hour]>=min_tph]

freq_rep_trips_gdf = geo_df[geo_df['rep_trip_id'].isin(freq['rep_trip_id'])]

print (f'There are {len(freq)} lines that have {min_tph} trips or more per hour.')
There are 53 lines that have 4 trips or more per hour.
In [22]:
# map them
create_line_map(freq_rep_trips_gdf, 'route_short_name')

The get_line_stops_gdf() function returns a GeoDataFrame consisting of the sequence of stops for each route returned by get_lines_gdf(). Use it and stops property to get the stops associated with the frequent routes.¶

In [23]:
# first get route stops for these lines:
route_stops = transit_analyst.get_line_stops_gdf()
route_stops = route_stops[route_stops['rep_trip_id'].isin(freq['rep_trip_id'])]

# Now get the unique stops:
unique_stops = route_stops[route_stops['stop_id'].isin(transit_analyst.stops['stop_id'])]


fig = px.scatter_mapbox(unique_stops,
                        lat=unique_stops['stop_lat'],
                        lon=unique_stops['stop_lon'],
                        hover_name="stop_id",
                        zoom=8,
                        mapbox_style="open-street-map")
fig.show()

The function get_tph_at_stops() function returns a DataFrame of total trips at each stop, independent of route. Let's use it to map stops by total number of trips for the AM peak:¶

In [24]:
list_of_hours = ['hour_6', 'hour_7', 'hour_8']
trips_at_stops = transit_analyst.get_tph_at_stops()
trips_at_stops['am_trips'] = trips_at_stops['hour_6'] + trips_at_stops['hour_7'] + trips_at_stops['hour_8']

# join to stops
trips_at_stops = trips_at_stops.merge(transit_analyst.stops, how = 'left', on = 'stop_id')

# get the stops with the most trips:
max_trips = trips_at_stops['am_trips'].max()
test = trips_at_stops[trips_at_stops['am_trips']==max_trips]

if len(test)==1:
    stop_id = test['stop_id'].values[0]
    number_of_trips = test['am_trips'].values[0]
    print (f'stop_id {str(stop_id)} has the most AM trips with {number_of_trips}')
    
stop_id KC_420 has the most AM trips with 263.0
In [25]:
# limit to 1000 stops
if len(trips_at_stops) > 1000:
    trips_at_stops.sort_values(by='am_trips', ascending=False, inplace = True)
    trips_at_stops = trips_at_stops.head(1000)

fig = px.scatter_mapbox(trips_at_stops,
                        lat=trips_at_stops['stop_lat'],
                        lon=trips_at_stops['stop_lon'],
                        hover_name="stop_id",
                        zoom=8,
                        mapbox_style="open-street-map",
                        size = 'am_trips')

fig.show()

Let's use GeoPandas buffering function to buffer these stops by 1/4 mile.¶

In [26]:
gdf = trips_at_stops.copy()
gdf = gpd.GeoDataFrame(gdf, geometry=gpd.points_from_xy(gdf.stop_lon, gdf.stop_lat))
# set initial geographic coordinate system.
gdf =gdf.set_crs(4326)
# need to switch to a projected system (state plane) to do buffer.
gdf = gdf.to_crs(2285)
gdf.geometry = gdf.geometry.buffer(1320)
# switch back to map
gdf = gdf.to_crs(4326)
gdf.set_index('stop_id', inplace = True)

fig = px.choropleth_mapbox(gdf,
                           geojson=gdf.geometry,
                           locations=gdf.index,
                           color="am_trips",
                           center={"lat": 47.6, "lon": -122.7073},
                           mapbox_style="open-street-map",
                           zoom=8)
fig.show()
C:\Users\scoe\Anaconda3\envs\geopandas\lib\site-packages\geopandas\array.py:275: ShapelyDeprecationWarning:

The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.